Guide Actor-Critic for Continuous Control

Authors

Abbas Abdolmaleki

Masashi Sugiyama

Abstract

Actor-critic methods solve reinforcement learning problems by updating a parameterized policy, known as an actor, in a direction that increases an estimate of the expected return, known as a critic. However, existing actor-critic methods use only the values or gradients of the critic to update the policy parameter. In this paper, we propose a novel actor-critic method called the guide actor-critic (GAC). GAC first learns a guide actor that locally maximizes the critic and then updates the policy parameter based on the guide actor by supervised learning. Our main theoretical contributions are twofold. First, we show that GAC updates the guide actor by performing second-order optimization in the action space, where the curvature matrix is based on the Hessians of the critic. Second, we show that the deterministic policy gradient method is a special case of GAC when the Hessians are ignored. Through experiments, we show that our method is a promising reinforcement learning method for continuous control.
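The contrast the abstract draws between a second-order update in the action space and an update that ignores the Hessian can be illustrated on a toy problem. The sketch below is not the GAC update itself, whose exact form the abstract does not give. It assumes a hypothetical concave quadratic critic at a fixed state, and every name in it (`critic`, `second_order_step`, `first_order_step`, `A`, `b`) is invented for illustration.

```python
import numpy as np

def critic(a, A, b):
    """Toy critic Q(a) = -0.5 a^T A a + b^T a at a fixed state (an assumption)."""
    return -0.5 * a @ A @ a + b @ a

def critic_grad(a, A, b):
    """Gradient of the toy critic with respect to the action."""
    return -A @ a + b

def critic_hessian(a, A, b):
    """Hessian of the toy critic with respect to the action (constant here)."""
    return -A

def second_order_step(a, A, b, damping=1e-6):
    """Newton-style ascent step whose curvature matrix is the negated Hessian."""
    g = critic_grad(a, A, b)
    curvature = -critic_hessian(a, A, b) + damping * np.eye(len(a))
    return a + np.linalg.solve(curvature, g)

def first_order_step(a, A, b, step_size=0.1):
    """The same step with the Hessian ignored: a plain gradient ascent step."""
    return a + step_size * critic_grad(a, A, b)

# Hypothetical problem data: A is symmetric positive definite, so Q is concave.
A = np.array([[4.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
a0 = np.zeros(2)

a_newton = second_order_step(a0, A, b)  # lands near the maximizer in one step
a_grad = first_order_step(a0, A, b)     # moves only a short way uphill
```

On this quadratic critic the curvature-aware step reaches the maximizer of the critic almost exactly, while the gradient-only step makes a small improvement. A guide action obtained this way could then serve as a regression target for the policy, in the spirit of the supervised learning step the abstract mentions.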


Similar resources

Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for ot...


A Direct Control Method For a Class of Nonlinear Systems Using Neural Networks

CUED/F-INFENG/TR.65. Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England. March 1991. A direct control scheme for a class of continuous-time nonlinear systems using neural networks is presented. The objective of control is to track a desired reference signal. This objective is achieved through input/output linearization of the system with neural networks. The...


Improving Communication of Critical Domain Knowledge in High-Consequence Software Development: An Empirical Study

K. S. Hanks; University of Virginia; Charlottesville, Virginia. J. C. Knight; University of Virginia; Charlottesville, Virginia. Keywords: requirements, natural language, safety-critical. Abstract: Poor requirements are implicated in a disproportionate number of defects in sa...


Lecture Notes in Computer Science

Nonlinear stabilization by hybrid quantized feedback. Daniel Liberzon, Dept. of Elect. Eng., Yale University, New Haven, CT 06520-8267, U.S.A. daniel.liberzon@yale.edu Abstract. This paper is concerned with global asymptotic stabilization of continuous-time control systems by means of quantized feedback. For linear systems, a hybrid control strategy for dealing with this problem was recently proposed ...


G-frames and their duals for Hilbert C*-modules

Abstract. Certain facts about frames and generalized frames (g-frames) are extended for the g-frames for Hilbert C*-modules. It is shown that g-frames for Hilbert C*-modules share several useful properties with those for Hilbert spaces. The paper also characterizes the operators which preserve the class of g-frames for Hilbert C*-modules. Moreover, a necessary and sufficient condition is ob...




Publication date: 2018
